Graphical viewer for biomolecular sequence data

ABSTRACT

Disclosed are methods, media and systems for graphically displaying computer-based biomolecular sequence information. Generally, biomolecular sequence information may be graphically depicted in a variety of different forms in accordance with the present invention. The sequence information may be composed of nucleotide or amino acid sequence information or both. The graphical depictions may be in several different formats providing different information relating to the sequences, and may be displayed in one or more screens of a computer user interface.

CROSS-REFERENCE TO RELATED APPLICATIONS

[0001] This application is a continuation-in-part of co-pending application Ser. No. 08/856,647, filed May 15, 1997, entitled DATABASE AND SYSTEM FOR DETERMINING STORING AND DISPLAYING GENE LOCUS INFORMATION, and application Ser. No. 08/857,382, filed May 15, 1997, entitled DATABASE AND SYSTEM FOR STORING, COMPARING AND DISPLAYING GENOMIC INFORMATION, the disclosures of which are incorporated by reference herein for all purposes.

BACKGROUND OF THE INVENTION

[0002] The present invention relates generally to the field of bioinformatics. In particular, the invention relates to methods, media and systems for graphically displaying computer-based biomolecular sequence information.

[0003] Informatics is the study and application of computer and statistical techniques to the management of information. Bioinformatics includes the development of methods to search computer databases of biomolecular sequence information (e.g., nucleic acid and protein) quickly, to analyze and display biomolecular sequence information, and to predict protein sequence, structure and function from DNA sequence data.

[0004] Increasingly, molecular biology is shifting from the laboratory bench to the computer desktop. Today's researchers require advanced quantitative analyses, database comparisons, and computational algorithms to explore the relationships between sequence and phenotype. Thus, by all accounts, researchers cannot and will not be able to avoid using computer resources to explore gene sequencing, gene expression, and molecular structure.

[0005] One use of bioinformatics involves studying an organism's genome to determine the sequence and placement of its genes and their relationship to other sequences and genes within the genome or to genes in other organisms. Such information is of significant interest in biomedical and pharmaceutical research, for instance to assist in the evaluation of drug efficacy and resistance. To make genomic information manipulation easy to perform and understand, sophisticated computer database systems have been developed. Incyte Pharmaceuticals, Inc. of Palo Alto, Calif., has developed several such databases, including some in which genomic sequence data is electronically recorded and annotated with information available from public sequence databases. Examples of such public sequence databases include GenBank (NCBI) and SWISSPROT. The resulting information is stored in a relational database that may be employed to determine relationships between sequences and genes within and among genomes.

[0006] While genetic data processing and relational database systems such as those developed by Incyte Pharmaceuticals, Inc. provide great power and flexibility in analyzing genetic information, further improvements in these systems will help accelerate biological research for numerous applications.

[0007] One area of interest in this regard is the display of biomolecular sequence information. As noted above, an important goal of genome research is to determine the sequence and placement of an organism's genes and their relationship to other sequences and genes within the genome, to genes in other organisms, and to related protein sequences. The ability to clearly and effectively display gene loci information for a given organism or organisms would greatly assist this task.

[0008] Accordingly, the development of a display tool which allows a user to clearly and effectively display gene loci information for a given organism or organisms and/or other biomolecular sequence information is desirable.

SUMMARY OF THE INVENTION

[0009] The present invention meets this need by providing methods, media and systems for graphically displaying computer-based biomolecular sequence information. Generally, biomolecular sequence information may be graphically depicted in a variety of different forms in accordance with the present invention. The sequence information may be composed of nucleotide or amino acid sequence information or both. The graphical depictions may be in several different formats providing different information relating to the sequences, and may be displayed in one or more screens of a computer user interface.

[0010] A graphical viewer in accordance with the present invention preferably has a plurality of panels, each panel displaying information about the biomolecular sequence data of interest in a different way on a single screen or page. For example, a first panel could show a graphical representation of the entire biomolecular sequence, or the portion of the sequence of interest, with the locations of particular subsequences of interest indicated. A second panel could show a more detailed graphical representation of all or a selected portion of the sequence represented in the first window, allowing a user to focus on a particular subsequence of interest. This second panel view could depict additional information, such as annotations, relating to the particular subsequences of interest. A third panel could show information graphically representing the confidence level or origination, for example, of the biomolecular sequence data represented in one or more of the other panels. Additional panels on the same or additional screens could show, for example, the actual nucleotide or amino acid sequence of or relating to a selected subsequence of interest represented in one or more of the other panels, or other information relating to the biomolecular sequence data.

[0011] In one preferred embodiment, a graphical viewer in accordance with the present invention provides a graphical representation of all or a selected portion of an organism's genome with its individual loci indicated. The viewer allows the user to focus on a particular region or locus of interest and have it also be graphically represented with additional information, such as annotations. A graphical depiction of sequence coverage for the sequence regions represented in the viewer may also be provided.

[0012] The viewer may also provide for the display of related loci from other portions of the organism's genome (i.e., paralogs), and allows for the retrieval of information about the loci, such as actual nucleotide sequences or detailed annotations, from an associated relational database system. In addition, a graphical viewer in accordance with the present invention may provide for the graphical representation and comparison of multiple portions of the genome of one or more organisms based on a locus of interest and its corresponding paralogs and homologs (related loci from another organism's genome).

[0013] A graphical viewer in accordance with a preferred embodiment of the present invention preferably provides graphical representations of the genomic data in a plurality of panels, each panel displaying information about the genomic data of interest in a different way. In a particularly preferred embodiment of the invention, the graphical viewer has three main panels on a single screen: a legend viewer, which shows the entire portion of the genome under consideration; a target viewer, which allows a user to focus (“zoom in”) on areas of the genome portion of particular interest; and a sequence depth viewer, which contains graphical information illustrating the depth of coverage over the length of the genome portion under consideration.

[0014] In one aspect, the present invention provides a method implemented in a computer system for presenting biomolecular sequence data. The method involves retrieving biomolecular sequence data from a database in response to a user query, and graphically depicting elements of the biomolecular sequence data in a user interface for the computer system. The graphical depiction may include a plurality of panels representing different aspects of the biomolecular sequence data in a single frame.

[0015] In a preferred embodiment, the biomolecular sequence data may include gene locus data and be graphically depicted in three panels, the first panel graphically depicting at least a portion of a contig and its associated loci, the second panel graphically depicting at least a portion of the contig depicted in the first panel and annotated loci associated with the portion, and the third panel graphically depicting information indicating the number of sequencing operations conducted to determine the sequence data depicted in the second panel. The third panel may graphically depict sequences used to assemble the portion of the contig depicted in the second panel, or depth of coverage information for the portion of the contig depicted in the second panel.

[0016] In another aspect, the invention provides another method implemented in a computer system for presenting biomolecular sequence data. The method involves retrieving biomolecular sequence data for a plurality of homologous loci from a database in response to a user query, and graphically depicting at least some of the homologous loci in a user interface for the computer system.

[0017] In yet another aspect, the invention provides a computer system. The computer system includes a database including biomolecular sequence data, and a user interface. The user interface is capable of receiving a query relating to the biomolecular sequence data, and graphically displaying the results of the query.

[0018] In still another aspect, the invention provides a computer-readable medium containing programmed instructions arranged to graphically display biomolecular sequence data. The computer-readable medium includes programmed instructions for retrieving biomolecular sequence data from a computer system database in response to a user query, and graphically depicting elements of the biomolecular sequence data in a user interface for the computer system.

[0019] These and other features and advantages of the present invention will be presented in more detail in the following specification of the invention and the accompanying figures which illustrate by way of example the principles of the invention.

BRIEF DESCRIPTION OF THE DRAWINGS

[0020] FIG. 1A is a block diagram of a client-server Intranet for providing database services in accordance with one embodiment of the present invention.

[0021] FIG. 1B is a schematic representation of the various software documents and entities employed by the FIG. 1A client-server Intranet to provide biological information in response to user queries.

[0022] FIG. 1C is a block diagram illustrating the accessibility of graphical viewer features in accordance with a preferred embodiment of the present invention in connection with a biomolecular sequence database.

[0023] FIG. 2 is a screen shot (HTML page) depicting a Contig Results page for a graphical user interface of a genomic sequences database suitable for selecting a locus to be viewed with a biomolecular sequence graphical viewer in accordance with one embodiment of the present invention.

[0024] FIG. 3 is a screen shot depicting a Locus Information page for a graphical user interface of a genomic sequences database suitable for accessing a biomolecular sequence graphical viewer in accordance with one embodiment of the present invention.

[0025] FIG. 4A is a screen shot depicting a main page of a biomolecular sequence graphical viewer in accordance with one embodiment of the present invention.

[0026] FIG. 4B is a screen shot depicting a main page of a biomolecular sequence graphical viewer modified to illustrate the zoom feature in accordance with one embodiment of the present invention.

[0027] FIG. 5A is a Selected Object Details window in accordance with one embodiment of the present invention.

[0028] FIGS. 5B and 5C are screen shots depicting an alignment viewer in accordance with one embodiment of the present invention.

[0029] FIG. 6 is a screen shot depicting a main page of a biomolecular sequence graphical viewer modified to illustrate the new starting coordinate feature in accordance with one embodiment of the present invention.

[0030] FIG. 7 is a screen shot depicting a main page of a biomolecular sequence graphical viewer modified to illustrate the new ContigID feature in accordance with one embodiment of the present invention.

[0031] FIGS. 8A-8C are screen shots depicting pages of a biomolecular sequence graphical viewer illustrating a feature which displays an actual biomolecular sequence in accordance with one embodiment of the present invention.

[0032] FIG. 9 is a flow chart depicting a process flow by which gene locus information may be viewed with a biomolecular sequence graphical viewer in accordance with a preferred embodiment of the present invention.

[0033] FIGS. 10A-10E are screen shots depicting the operation of a multiple organism biomolecular sequence graphical viewer in accordance with one embodiment of the present invention.

[0034] FIG. 11 is a flow chart depicting a process flow by which multiple organism gene locus information may be viewed with a biomolecular sequence graphical viewer in accordance with a preferred embodiment of the present invention.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS

[0035] Reference will now be made in detail to preferred embodiments of the invention. Examples of the preferred embodiments are illustrated in the accompanying drawings. While the invention will be described in conjunction with these preferred embodiments, it will be understood that it is not intended to limit the invention to one or more preferred embodiments. On the contrary, it is intended to cover alternatives, modifications, and equivalents as may be included within the spirit and scope of the invention as defined by the appended claims. In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention. The present invention may be practiced without some or all of these specific details. In other instances, well known process operations have not been described in detail in order not to unnecessarily obscure the present invention.

[0036] Introduction

[0037] The present invention provides methods, media and systems for graphically displaying computer-based biomolecular sequence information. Generally, biomolecular sequence information may be graphically depicted in a variety of different forms in accordance with the present invention. The sequence information may be composed of nucleotide or amino acid sequence information or both. The graphical depictions may be in several different formats providing different information relating to the sequences, and may be displayed in one or more screens of a computer user interface.

[0038] A graphical viewer in accordance with the present invention preferably has a plurality of panels, each panel displaying information about the biomolecular sequence data of interest in a different way on a single screen or page. For example, a first panel could show a graphical representation of the entire biomolecular sequence, or the portion of the sequence of interest, with the locations of particular subsequences of interest indicated. A second panel could show a more detailed graphical representation of all or a selected portion of the sequence represented in the first window, allowing a user to focus on a particular subsequence of interest. This second panel view could depict additional information, such as annotations, relating to the particular subsequences of interest. A third panel could show information graphically representing the confidence level or origination, for example, of the biomolecular sequence data represented in one or more of the other panels. Additional panels on the same or additional screens could show, for example, the actual nucleotide or amino acid sequence of or relating to a selected subsequence of interest represented in one or more of the other panels, or other information relating to the biomolecular sequence data.

[0039] In one preferred embodiment, a graphical viewer in accordance with the present invention provides a graphical representation of all or a selected portion of an organism's genome with its individual loci indicated. The viewer allows the user to focus on a particular region or locus of interest and have it also be graphically represented with additional information, such as annotations. A graphical depiction of sequence coverage for the sequence regions represented in the viewer may also be provided.

[0040] The viewer may also provide for the display of related loci from other portions of the organism's genome (i.e., paralogs), and allows for the retrieval of information about the loci, such as actual nucleotide sequences or detailed annotations, from an associated relational database system. In addition, a graphical viewer in accordance with the present invention may provide for the graphical representation and comparison of multiple portions of the genome of one or more organisms based on a locus of interest and its corresponding paralogs and homologs (related loci from another organism's genome).

[0041] A graphical viewer in accordance with a preferred embodiment of the present invention preferably provides graphical representations of the genomic data in a plurality of panels, each panel displaying information about the genomic data of interest in a different way. In a particularly preferred embodiment of the invention, the graphical viewer has three main panels on a single screen: a legend viewer, which always shows the entire portion of the genome under consideration; a target viewer, which allows a user to focus (“zoom in”) on areas of the genome portion of particular interest; and a sequence depth viewer, which contains graphical information illustrating the depth of coverage over the length of the genome portion under consideration.

[0042] Of course, as noted above, a graphical viewer in accordance with the present invention may be used to display biomolecular sequence information other than the gene locus information described with reference to the preferred embodiments of the invention described herein. For example, a graphical viewer in accordance with the present invention may be used to display peptide or nucleotide sequence information, and can be used to display actual sequences resulting from comparisons of sequences from, for example, a BLAST or FASTA search.

[0043] The Graphical Viewer Environment

[0044] As noted above, a graphical viewer in accordance with the present invention is preferably used in connection with a biomolecular sequence relational database system, such as those developed by Incyte Pharmaceuticals, Inc. of Palo Alto, Calif., and described, for example, in patent application Ser. Nos. 08/947,845, 08/856,647, 08/811,758, 08/812,290 and 08/857,382, the disclosures of which are incorporated by reference herein for all purposes. Data to be displayed by a graphical viewer in accordance with the present invention is accessed from such a database system using techniques and commands well known to those of skill in the art. FIGS. 1A and 1B and the associated description provided below provide a context in which a graphical viewer in accordance with the present invention may operate.

[0045] FIG. 1A depicts a network system 130 suitable for storing and retrieving information in relational databases, such as those suitable for supporting a graphical viewer in accordance with the present invention. Network 130 includes a network cable 134 to which a network server 136 and clients 138a and 138b (representative of possibly many more clients) are connected. Cable 134 is also connected to a firewall/gateway 140 which is in turn connected to the Internet 142.

[0046] Network 130 may be any one of a number of conventional network systems, including a local area network (LAN) or a wide area network (WAN), as is known in functionality for packaging client calls in a well-known format (e.g., URL) together with any parameter information into a format (of one or more packets) suitable for transmission across a cable or wire 134, for delivery to database server 136.
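As an illustration only, packaging a client call as a URL together with its parameter information might be sketched as follows. The host name, path and parameter names (`locus_id`, `contig_id`) are hypothetical and do not come from the specification; only the Python standard library is used.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def package_client_call(base_url, params):
    """Client side: append the parameters to the URL as a query string."""
    return f"{base_url}?{urlencode(params)}"


def unpack_client_call(url):
    """Server side: recover the parameter dictionary from the URL."""
    query = urlsplit(url).query
    # parse_qs returns a list per key; this sketch keeps the first value only.
    return {key: values[0] for key, values in parse_qs(query).items()}


# Hypothetical endpoint and parameter names, chosen for illustration.
url = package_client_call(
    "http://server.example/cgi-bin/locus_info",
    {"locus_id": "SAU100241", "contig_id": "SAU1c0039"},
)
params = unpack_client_call(url)
```

The resulting URL string is what would be split into packets by the lower network layers; that step is handled by the TCP/IP stack and is not shown.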

[0047] Server 136 includes the hardware necessary for running software to (1) access database data for processing user requests, and (2) provide an interface for serving information to client machines 138a and 138b. In a preferred embodiment, depicted in FIG. 1A, the software running on the server machine supports the World Wide Web protocol for providing page data between a server and client. In this embodiment, a web server 156 having URL and HTTP functionality communicates with a client via the HTTP protocol.

[0048] Client/server environments, database servers, relational databases and networks are well documented in the technical, trade, and patent literature. For a discussion of database servers, relational databases and client/server environments generally, and SQL servers particularly, see, e.g., Nath, A., The Guide To SQL Server, 2nd ed., Addison-Wesley Publishing Co., 1995 (which is incorporated herein by reference for all purposes).

[0049] As shown, server 136 includes an operating system 150 (e.g., UNIX) on which runs a relational database management system 152, a World Wide Web application 154, and a World Wide Web server 156. The software on server 136 may assume numerous configurations. For example, it may be provided on a single machine or distributed over multiple machines.

[0050] World Wide Web application 154 includes the executable code necessary for generation of database language statements (e.g., Structured Query Language (SQL) statements). Generally, the executables will include embedded SQL statements. In addition, application 154 includes a configuration file 160 which contains pointers and addresses to the various software entities that comprise the server as well as the various external and internal databases which must be accessed to service user requests. Configuration file 160 also directs requests for server resources to the appropriate hardware, as may be necessary should the server be distributed over two or more separate computers.

[0051] Each of clients 138a and 138b includes a World Wide Web browser for providing a user interface to server 136, and including code necessary to generate HTML pages. Through the Web browser, clients 138a and 138b construct search requests for retrieving data from a sequence database 144 and/or a genomic database 146, for example. Thus, the user will typically point and click on user interface elements such as buttons, pull down menus, scroll bars, etc. conventionally employed in graphical user interfaces. The requests so formulated with the client's Web browser are transmitted to Web application 154 which formats them to produce a query that can be employed to extract the pertinent information from sequence database 144 or genomic database 146.

[0052] In the embodiment shown, the Web application accesses data in genomic database 146 by first constructing a query in a database language (e.g., Sybase or Oracle SQL). The database language query is then handed to relational database management system 152 which processes the query to extract the relevant information from database 146. In the case of a request to access sequence database 144, Web application 154 directly communicates the request to that database without employing the services of database management system 152.

[0053] The procedure by which user requests are serviced is further illustrated with reference to FIG. 1B. In this embodiment, the World Wide Web server and/or executable Web application components of server 136 provide Hypertext Mark-up Language documents (“HTML pages”) 164 to a client machine. At the client machine, the HTML document provides a user interface 166 which is employed by a user to formulate his or her requests for access to database 146. That request is converted by the Web application component of server 136 to a SQL query 168. That query is used by the database management system component of server 136 to access the relevant data in database 146 and provide that data to server 136 in an appropriate format. Server 136 then generates a new HTML document, possibly through the Web application 154, relaying the database information to the client as a view in user interface 166.
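The conversion of a user request into a SQL query could be sketched, under stated assumptions, as below. The table name `locus`, its columns, and the form field names are invented for illustration; the specification does not disclose the actual schema or the code of Web application 154.

```python
def build_locus_query(form):
    """Translate user-interface form fields into a parameterized SQL query.

    Returns the SQL text and the list of bound values; values are never
    interpolated into the SQL string itself.
    """
    clauses, values = [], []
    if form.get("contig_id"):
        clauses.append("contig_id = ?")
        values.append(form["contig_id"])
    if form.get("min_start") is not None:
        clauses.append("start_pos >= ?")
        values.append(int(form["min_start"]))

    # Hypothetical table and column names.
    sql = "SELECT locus_id, start_pos, end_pos FROM locus"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    sql += " ORDER BY start_pos"
    return sql, values


# Form fields arrive from the HTML page as strings.
sql, values = build_locus_query({"contig_id": "SAU1c0039", "min_start": "1000"})
```

Keeping the values separate from the SQL text lets the database management system bind them safely, which is one common way to hand a query built from user input to a relational engine.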

[0054] While the embodiment shown in FIG. 1A employs a World Wide Web server and World Wide Web browser for a communication between server 136 and clients 138a and 138b, other communications protocols will also be suitable. For example, client calls may be packaged directly as SQL statements, without reliance on Web application 154 for a conversion to SQL. Clients may also query the database directly without using a client browser.

[0055] When network 130 employs a World Wide Web server and clients, it must support a TCP/IP protocol. Local networks such as this are sometimes referred to as “Intranets.” An advantage of such Intranets is that they allow easy communication with public domain databases residing on the World Wide Web (e.g., the GenBank World Wide Web site). Thus, in a particular preferred embodiment of the present invention, clients 138a and 138b can directly access data (via Hypertext links for example) residing on Internet databases using an HTML interface provided by Web browsers and Web server 156.

[0056] Bear in mind that if the contents of the local databases are to remain private, a firewall 140 must preserve in confidence the contents of a sequence database 144 and a genomic database 146.

[0057] In a preferred embodiment, sequence database 144 is a flat file database with a single file for genomic sequences from different species. Other possible approaches may include partitioning the sequence data according to different species or whether or not sequences have been found to be unique to the local database (i.e., sequences that did not have any hits in an external database such as GenBank).

[0058] Preferably, the information in genomic database 146 is stored in a relational format. Such a relational database supports a set of operations defined by relational algebra. It generally includes tables composed of columns and rows for the data contained in the database. Each table has a primary key, being any column or set of columns the values of which uniquely identify the rows in the table. The tables of a relational database may also include a foreign key, which is a column or set of columns the values of which match the primary key values of another table. A relational database is also generally subject to a set of operations (select, project, product, join and divide) which form the basis of the relational algebra governing relations within the database. As noted above, relational databases are well known and documented (see, e.g., Nath, A., The Guide To SQL Server, referenced above).
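A minimal sketch of the primary key, foreign key and join concepts described above, using Python's built-in `sqlite3` module rather than the Sybase or Oracle systems named in this specification. The two-table schema, the coordinates, and the second locus row are assumptions made up for the example; only the contig and locus identifiers and the organism name appear in the specification.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE contig (
    contig_id TEXT PRIMARY KEY,              -- primary key: uniquely identifies a row
    organism  TEXT NOT NULL
);
CREATE TABLE locus (
    locus_id  TEXT PRIMARY KEY,
    contig_id TEXT NOT NULL
              REFERENCES contig(contig_id),  -- foreign key into contig
    start_pos INTEGER NOT NULL,
    end_pos   INTEGER NOT NULL
);
""")

# Illustrative rows; coordinates and the second locus are invented.
conn.execute("INSERT INTO contig VALUES (?, ?)",
             ("SAU1c0039", "Staphylococcus aureus"))
conn.executemany("INSERT INTO locus VALUES (?, ?, ?, ?)", [
    ("SAU100241", "SAU1c0039", 1200, 2100),
    ("EXAMPLE_LOCUS", "SAU1c0039", 2500, 3300),
])


def loci_for_organism(connection, organism):
    """Join locus to contig on the key columns, then select and project."""
    rows = connection.execute(
        """SELECT l.locus_id, l.start_pos, l.end_pos
           FROM locus AS l JOIN contig AS c ON l.contig_id = c.contig_id
           WHERE c.organism = ?
           ORDER BY l.start_pos""",
        (organism,),
    )
    return rows.fetchall()


result = loci_for_organism(conn, "Staphylococcus aureus")
```

With foreign key enforcement on, inserting a locus whose `contig_id` has no matching contig row raises `sqlite3.IntegrityError`, which is the practical effect of the key relationship.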

[0059] A relational database may be implemented in different ways. In Oracle™ databases, for example, the various tables are not physically separated, as there is one instance of work space with different ownership specified for different tables. In Sybase™ databases, in contrast, the tables may be physically segregated into different “databases.”

[0060] One specific configuration for network 130 for multiple users provides both the genomic and sequence databases on the same machine. If there is a high volume of sequence searching, it may be desirable to have a second processor of similar size and split the application across the two machines to improve response time.

[0061] A suitable dual processor server machine may be any of the following workstations: Sun Ultra-Sparc 2™ (Sun Microsystems, Inc. of Mountain View, Calif.), SGI Challenge L™ (Silicon Graphics, Inc. of Mountain View, Calif.), and DEC 2100A™ (Digital Equipment Corporation of Maynard, Mass.). Multiprocessor systems (minimum of 4 processors to start) may include the following: Sun Ultra Sparc Enterprise 4000™, SGI Challenge XL™, and DEC 8400™. Preferably, the server machine is configured for network 130 and supports TCP/IP protocol.

[0062] Depending upon the workstation employed, the operating system may be, for example, one of the following: for Sun, Sun OS 5.5 (Solaris 2.5); for SGI, IRIX 5.3 (or later); or for DEC, Digital UNIX 3.2D (or later).

[0063] Databases used in conjunction with this invention may be downloaded via a 4×4 Gb+FWSCSI-2, Fiber Link Raid Units 20 Gb+, or 4 DAT Tape Drive. A CD ROM drive may also be acceptable.

[0064] The client machine may be, for example, a Macintosh™ (Apple Computer Inc. of Cupertino, Calif.), a PC, or a Unix workstation. It should also be TCP/IP capable with a Netscape or Internet Explorer Web Browser.

[0065] The network may include a 10Base-T, 100Base-T or higher connection, be TCP/IP capable, and provide access to the Internet for HTML hyperlinks to external databases.

[0066] FIG. 1C illustrates the accessibility of graphical viewer features in accordance with a preferred embodiment of the present invention. A graphical viewer in accordance with the present invention is preferably provided together with a suite of functions made available to users through a collection of user interface screens (e.g., HTML or Java® pages) viewed in the user interface of a biomolecular relational database. Typically, the interface will have a main viewer page from which various lines of query can be followed. In a preferred embodiment, the main viewer page (and other graphical viewers) are Java®-based applets running on the network system. Given the functionalities described herein, one of ordinary skill in the art would be able to implement the graphical viewers of the present invention in Java® or other programming environments. The viewer page is typically accessed from another page provided as part of the user interface of a biomolecular sequence relational database in connection with which the graphical viewer is used.

[0067] For example, a user interface screen (e.g., HTML page) 170 displays textual information relating to a plurality of biomolecular sequences. One or more sequences displayed in the page 170 may be selected, for example, using the pointer provided in the GUI, to access another page 180 which displays additional information about the selected sequences. This page 180 may include a button which when selected accesses a main graphical viewer page 190. The graphical viewer page (e.g., Java® page) 190 graphically depicts information about the selected sequences. The page also preferably includes buttons 192 which allow a user to modify the graphical display. The buttons 192 may also include buttons which a user may select to access additional graphical viewer pages 194, 196, which graphically or otherwise display additional information relating to the graphically displayed sequence information in page 190.

[0068] Gene Locus Implementation

[0069] The invention will now be described with reference to a particular preferred implementation of the invention to graphically depict gene locus information. The invention will be described with reference to a database optimized for microbial data, such as that described with reference to parent application Ser. No. 08/856,647, previously incorporated by reference. However, application of the present invention is by no means so limited. For example, the invention covers graphical viewers used in connection with databases optimized for other sources of biomolecular sequence data, such as animal sequences (e.g., human, primate, rodent, amphibian, insect, etc.) and plant sequences.

[0070] As noted above, a graphical viewer in accordance with the present invention is preferably provided together with a suite of functions made available to users through a collection of user interface screens viewed in the user interface of a biomolecular relational database. A main viewer page is typically accessed from another page provided as part of the user interface of a biomolecular sequence relational database in connection with which the graphical viewer is used, in this case a microbial genomic database. FIG. 2 depicts one such other page from the microbial genomic database. The Contig Results page 200 displays a list of loci (identified by their LocusIDs) for genes localized to a particular “contig” (a group of assembled overlapping sequences), contig SAU1c0039, of the genomic sequence of a microbial organism, in this case Staphylococcus aureus.

[0071] By clicking on a particular LocusID in Contig Results page 200, a user accesses a Locus Information page, such as depicted in FIG. 3. Clicking on the LocusID SAU100241 in page 200 returns the Locus Information page 300, which displays details about the locus SAU100241. The page also displays a Graphical Viewer button 302 which when selected launches a graphical viewer in accordance with the present invention.

[0072] FIG. 4A depicts a main graphical viewer page 400 accessed by selecting the Graphical Viewer button 302 in Locus Information page 300. In this preferred embodiment, the graphical viewers are Java®-based applets that provide a graphical representation of a portion of a contig and its related loci. A graphical viewer in accordance with the present invention preferably includes a plurality of separate component viewers. Where more than one component viewer is featured, the component viewers are preferably displayed in a single frame in order to enhance the effectiveness with which the graphically displayed data is conveyed to the user. A preferred embodiment includes three component viewers displayed in a single frame.

[0073] Thus, the graphical viewer 402 of page 400 has three viewer component panels on a single screen. The top panel 410 features a “legend viewer” 412, which shows the entire portion of the genome under consideration. The middle panel 430 features a “target viewer” 432, which allows a user to focus (“zoom in”) on areas of the genome portion of particular interest. The bottom panel 450 features a “sequence depth viewer” 452, which contains graphical information illustrating the depth of coverage over the length of the genome portion represented in the target viewer 432.

[0074] The graphical viewer page 400 also includes several buttons and windows along the top 403 of the page 400 for accessing and displaying additional information. A menu bar 404 is also provided for accessing pull-down menus listing various command and control functions. A scale 415, 435, 455 is depicted at the bottom of each viewer panel 410, 430, 450. The use of these features will be described in further detail below.

[0075] The legend viewer 412 always shows the entire portion of the contig which was loaded by the viewer when the user selected a contig in the previous screens. In a preferred embodiment, the viewer will load a predetermined default number of base pairs of the contig sequence. If the contig is shorter than the default, the entire contig will be depicted and the default will be adjusted. For example, in this embodiment, the viewer loads 30,000 base pairs starting at the first locus in the list on the Contig Results screen 200 (identified by its Hit ID), g2462967. The number of base pairs shown and the position on the contig may be determined with reference to the scale 415 depicted at the bottom of the legend viewer panel 410. The default value may, of course, be changed to any desired number.
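The default loading behavior described above may be sketched as follows. This is a minimal illustration only, not the implementation of the embodiment: the class name ContigWindow, the method load, and the assumption that coordinates are zero-based are all hypothetical.

```java
/** Hypothetical helper: computes the contig window loaded into the legend viewer. */
final class ContigWindow {
    final int start; // first coordinate shown
    final int end;   // last coordinate shown

    ContigWindow(int start, int end) {
        this.start = start;
        this.end = end;
    }

    /**
     * @param requestedStart coordinate at which loading begins (assumed zero-based)
     * @param defaultLength  default number of base pairs to load, e.g., 30,000
     * @param contigLength   total length of the contig in base pairs
     */
    static ContigWindow load(int requestedStart, int defaultLength, int contigLength) {
        // Keep the start coordinate inside the contig.
        int start = Math.max(0, Math.min(requestedStart, contigLength));
        // If fewer than defaultLength base pairs remain, shorten the window
        // so that it stops at the end of the contig.
        int end = Math.min(start + defaultLength, contigLength);
        return new ContigWindow(start, end);
    }
}
```

Under these assumptions, a start coordinate of 4467 with the 30,000 base pair default yields the window 4467 to 34,467 whenever the contig is at least that long, while a contig shorter than the default is shown in its entirety.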

[0076] The legend viewer 412 graphically represents contig SAU1c0039 as a line 414 which starts at coordinate (base pair number) 4467 and extends up to coordinate 34,467, as may be seen with reference to the scale 415. The contig depicted in the viewer is identified in a ContigID window 407. In addition, the starting coordinate for the portion of the contig depicted by the legend viewer 412 (namely, the starting coordinate of the selected locus g2462967: 4467) is noted in the Start Coord window 405. These windows 405, 407 may also be used to enter information in order to control the information depicted by the viewer, as described further below. A user may bring upstream or downstream portions of the contig into view in the legend viewer 412, and the other component viewers, by clicking on the directional buttons 406.

[0077] In addition to the contig, the legend viewer 412 shows the various loci residing on the portion of the contig. The manner in which these loci are depicted illustrates the power of a graphical viewer in accordance with the present invention in presenting information in a highly effective manner.

[0078] The loci are represented by arrows 416. Each locus is located beside the contig line 414 according to its position on the contig and the direction in which it is read. The arrowhead represents the direction in which a locus is read. Loci which are read in the forward (+) direction are depicted above the contig line 414. Loci which are read in the reverse (−) direction are depicted below the contig line 414. In addition, other graphical features may be used to convey information about the graphically depicted loci. For example, loci for which the sequences obtained are above an established confidence threshold may be depicted as broken arrows.

[0079] In this preferred embodiment, the loci are also represented in different colors based on their protein's function. Proteins are grouped into various functional categories, with each category being assigned a color. For example, in this preferred embodiment, the proteins corresponding to loci are grouped according to the following categories/colors: Motility/Light blue; Virulence/Red; Transport/Light Green; Regulation/Magenta; Macromolecule metabolism/Yellow; Small molecule metabolism/Dark blue; Structure/Dark Green; and Unclassified/Black. Of course other categories and colors may also be used. These arrow and color representation features for loci are used in both the legend viewer and the target viewer, discussed below.
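The arrow placement and color coding conventions described above may be sketched as follows. The enum and class names are hypothetical, and the RGB values chosen for the light and dark shades are assumptions, since only color names are specified herein.

```java
/** Functional categories and display colors (0xRRGGBB); the shade values are assumed. */
enum FunctionalCategory {
    MOTILITY(0xADD8E6),                  // light blue
    VIRULENCE(0xFF0000),                 // red
    TRANSPORT(0x90EE90),                 // light green
    REGULATION(0xFF00FF),                // magenta
    MACROMOLECULE_METABOLISM(0xFFFF00),  // yellow
    SMALL_MOLECULE_METABOLISM(0x00008B), // dark blue
    STRUCTURE(0x006400),                 // dark green
    UNCLASSIFIED(0x000000);              // black

    final int rgb;

    FunctionalCategory(int rgb) {
        this.rgb = rgb;
    }
}

/** Hypothetical helper for positioning a locus arrow relative to the contig line. */
final class LocusGlyph {
    /**
     * Returns the vertical pixel position of a locus arrow. Screen y grows
     * downward, so forward (+) loci get a smaller y (above the contig line)
     * and reverse (-) loci get a larger y (below it).
     */
    static int arrowY(int contigLineY, boolean forward, int laneOffset) {
        return forward ? contigLineY - laneOffset : contigLineY + laneOffset;
    }
}
```

Keeping the category-to-color mapping in one place would allow the legend viewer and the target viewer to share the same representation, consistent with the description above.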

[0080] The target viewer 432 initially displays the same scope as the legend viewer 412. The scope of the target viewer may be modified, however, by clicking on the Zoom buttons 409. The Zoom In button provides a closer view of a portion of the contig shown in the legend viewer 412. The closer view is depicted in the target viewer 432, with the scale 435 adjusting to reflect the amount of the zoom. The Zoom Out button provides a broader view of the contig, up to the maximum of the default base pair number selected for the legend viewer (minimum magnification). The Zoom None button automatically returns to the minimum magnification.
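The behavior of the Zoom buttons may be sketched as follows. The class name ZoomModel is hypothetical, and the choice of a factor of two per click, centered on the current view, is an assumption made for illustration; no zoom factor is specified herein.

```java
/** Hypothetical zoom state for the target viewer, bounded by the legend viewer range. */
final class ZoomModel {
    final int legendStart; // first coordinate shown by the legend viewer
    final int legendEnd;   // last coordinate shown by the legend viewer
    int viewStart;         // first coordinate shown by the target viewer
    int viewEnd;           // last coordinate shown by the target viewer

    ZoomModel(int legendStart, int legendEnd) {
        this.legendStart = legendStart;
        this.legendEnd = legendEnd;
        zoomNone();
    }

    /** Zoom In: halve the visible span (assumed factor). */
    void zoomIn() {
        resize((viewEnd - viewStart) / 2);
    }

    /** Zoom Out: double the visible span, up to the legend viewer range. */
    void zoomOut() {
        resize((viewEnd - viewStart) * 2);
    }

    /** Zoom None: return to minimum magnification. */
    void zoomNone() {
        viewStart = legendStart;
        viewEnd = legendEnd;
    }

    private void resize(int newSpan) {
        int maxSpan = legendEnd - legendStart;
        int span = Math.max(1, Math.min(newSpan, maxSpan));
        int center = viewStart + (viewEnd - viewStart) / 2;
        // Center the new span, then shift it so it stays inside the legend range.
        int start = Math.max(legendStart, Math.min(center - span / 2, legendEnd - span));
        viewStart = start;
        viewEnd = start + span;
    }
}
```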

[0081] Another way provided by a graphical viewer in accordance with the present invention to focus on a portion of interest of a contig 414 depicted in the legend viewer 412 is to provide an outline, such as a colored (e.g., red) box, around the portion of the contig 414 which is shown in the target viewer 432. In this preferred embodiment, when a red box surrounds the entire legend viewer panel, the target viewer also displays the entire 30,000 base pairs. This is the situation illustrated in FIG. 4A. When the Zoom buttons 409 are used, as described above, the red box is adjusted accordingly.

[0082] An area on the contig may also be zoomed into by direct user adjustment of the red box (known as “rubber banding”). The scope of the red box may be changed by clicking at a location in any of the viewer panels and dragging the cursor with a mouse to another location. The red box will then encompass the region between those two points, and only this region will be visible in the target and sequence depth viewers. FIG. 4B depicts an updated page showing the viewer 402 after a user has zoomed in on the portion 434 of the contig 414 depicted in the legend viewer 412 between about the coordinates 14,200 and 18,200. The scale 435 at the bottom of the target viewer 432 has been adjusted to reflect the new scope of the zoomed target view.
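The rubber banding operation amounts to converting the two mouse positions into contig coordinates. A minimal sketch follows, assuming a linear mapping between panel pixels and coordinates; the class and method names and the panel width parameter are hypothetical.

```java
/** Hypothetical helper translating a mouse drag into a contig coordinate range. */
final class RubberBand {
    /** Maps a pixel column in a panel of the given width to a contig coordinate. */
    static int toCoordinate(int x, int panelWidth, int viewStart, int viewEnd) {
        // Clamp so that drags ending outside the panel still select a valid region.
        int clamped = Math.max(0, Math.min(x, panelWidth));
        double offset = (double) clamped * (viewEnd - viewStart) / panelWidth;
        return viewStart + (int) Math.round(offset);
    }

    /** Returns {start, end} for the region between the press and release positions. */
    static int[] select(int pressX, int releaseX, int panelWidth, int viewStart, int viewEnd) {
        int a = toCoordinate(pressX, panelWidth, viewStart, viewEnd);
        int b = toCoordinate(releaseX, panelWidth, viewStart, viewEnd);
        // The drag may run right to left, so order the two endpoints.
        return new int[] { Math.min(a, b), Math.max(a, b) };
    }
}
```

The returned range would then become the scope of the red box and of the target and sequence depth viewers.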

[0083] Another feature of the target viewer is that the loci are annotated. As may be seen in FIGS. 4A and 4B, annotations 436 are provided for loci arrows which are long enough to accommodate the annotation information. If a locus of interest is too short to display its annotation, a user may zoom in further on the locus until it is long enough to allow the annotation to be displayed in the graphical representation.
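The test for whether an arrow is long enough to carry its annotation may be sketched as follows. For simplicity the sketch assumes a fixed character width in pixels, which is a hypothetical simplification; an actual viewer would more likely measure the rendered text.

```java
/** Hypothetical check of whether a locus arrow can accommodate its annotation. */
final class AnnotationFit {
    /**
     * @param label         annotation text, e.g., a LocusID
     * @param locusStart    start coordinate of the locus
     * @param locusEnd      end coordinate of the locus
     * @param pixelsPerBase current horizontal scale of the target viewer
     * @param charWidthPx   assumed fixed width of one character in pixels
     */
    static boolean fits(String label, int locusStart, int locusEnd,
                        double pixelsPerBase, int charWidthPx) {
        double arrowPx = (locusEnd - locusStart) * pixelsPerBase;
        return label.length() * charWidthPx <= arrowPx;
    }
}
```

Zooming in increases the number of pixels per base, which is why a locus too short to be annotated at minimum magnification can be annotated in a closer view.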

[0084] Individual loci in the target viewer 432 may be selected for further analysis by clicking on the graphically depicted locus. A selected locus is highlighted in some manner, for example, by displaying a colored (e.g., red) box around its representation. Details about this locus may be viewed by double-clicking on the locus representation. Double-clicking opens a Selected Object Details window, such as depicted in FIG. 5A. The Selected Object Details window 500 includes information about the locus, including its LocusID, gene (functional) category, base pair range, the sequence's homologous matches (preferably the number of homologous matches returned is limited to a preset number; for example, the top five matches are returned here) against other sequence databases, for example, the genpept database, and other information useful to researchers and relating to other features of the database system with which the graphical viewer is used. Many of the fields of information provided in the window 500 may be hyperlinks to other HTML pages or other screens.

[0085] The Selected Object Details window 500 includes an Alignment button 502. Clicking on this button accesses an alignment viewer which provides a graphical representation of the locus sequence and its homologous matches. An example of an alignment viewer 510 in accordance with a preferred embodiment of the present invention is shown in FIG. 5B. The alignment viewer 510 has three panels. The top two panels 512 and 514 provide a graphical representation of the locus identified in FIG. 5A (SAU101156). The third panel 516 provides graphical representations of the five homologs noted in FIG. 5A. The alignment viewer page also includes a number of buttons 518 which may be used to control the graphical representations. In particular, the page has Zoom buttons 520 which may be used to zoom into the sequence level of loci depicted in the lower two panels 514 and 516 (while the upper panel 512 maintains the depiction of the entire locus). FIG. 5C illustrates this Zoom feature where the upper panel 512 has a colored box 522 around the portion of the locus depicted with its homologs at the sequence level in the two lower panels 514 and 516. In this embodiment, the amino acid sequences are shown. In other embodiments, the corresponding nucleotide sequences may also be shown.

[0086] An additional feature of the graphical viewer page 400 that becomes useful when the scope of the view in the target viewer 432 is focused in on a portion of the contig sequence shown in the legend viewer 412 is a scroll bar 470 at the bottom of the page. The scroll bar 470 allows a user to move along the portion 434 of the contig 414 to bring upstream or downstream portions of the contig 414 into view in the target viewer 432.

[0087] The third panel 450 of the graphical viewer 402 in this embodiment of the present invention is the sequence depth viewer 452. The sequence depth viewer 452 provides a graph illustrating the depth of coverage, that is, the number of times that a given portion of the contig has been sequenced, over the length of the contig. The sequence depth viewer 452 displays its graph for the contig or portion of the contig displayed in the target viewer 432. Thus, in FIG. 4A, where the target viewer 432 and legend viewer 412 have the same scope, the sequence depth viewer 452 displays a graph showing the depth of coverage over the 30,000 base pairs of the contig 414 from coordinates 4467 to 34,467, as indicated by the scale 455 at the bottom of the sequence depth viewer panel 450. In FIG. 4B, however, the sequence depth viewer 452 displays a graph showing the depth of coverage over the approximately 4000 base pairs of the portion 434 of the contig zoomed in on in the target viewer from about coordinates 14,200 to 18,200, as indicated by the adjusted scale 455. The sequence depth viewer also includes a second scale 456 on the y-axis indicating the number of sequencing passes represented by the graph.
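The depth of coverage plotted by the sequence depth viewer may be derived from the coordinates of the sequences used to assemble the contig. The following is a minimal sketch using a difference array, assuming that each sequence is given as an inclusive start and end coordinate pair; the class name and the data layout are hypothetical.

```java
/** Hypothetical computation of depth of coverage over the range shown in the target viewer. */
final class CoverageDepth {
    /**
     * @param reads     each entry is {start, end}, inclusive contig coordinates of one sequence
     * @param viewStart first coordinate of the displayed range
     * @param viewEnd   last coordinate of the displayed range (inclusive)
     * @return depth[i] = number of sequences covering coordinate viewStart + i
     */
    static int[] compute(int[][] reads, int viewStart, int viewEnd) {
        int n = viewEnd - viewStart + 1;
        int[] diff = new int[n + 1];
        for (int[] read : reads) {
            // Clip each sequence to the displayed range.
            int s = Math.max(read[0], viewStart);
            int e = Math.min(read[1], viewEnd);
            if (s > e) {
                continue; // sequence lies entirely outside the view
            }
            diff[s - viewStart]++;     // coverage rises where the sequence begins
            diff[e - viewStart + 1]--; // and falls just past where it ends
        }
        int[] depth = new int[n];
        int running = 0;
        for (int i = 0; i < n; i++) {
            running += diff[i];
            depth[i] = running;
        }
        return depth;
    }
}
```

The resulting array gives one value per coordinate, which may be plotted against the x-axis scale 455, with the largest value determining the extent of the y-axis scale 456.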

[0088] The manner in which this depth of coverage information is depicted provides a further illustration of the power of a graphical viewer in accordance with the present invention in presenting information in a highly effective manner. A user of the graphical viewer is able to very quickly, at a glance, assimilate useful information relating to the confidence to be attributed to the sequence information depicted in the other panels of the viewer. In this preferred embodiment of the present invention, the sequence depth viewer 452 depicts coverage as a sequence distribution graph 454. A particular advantage of this way of depicting the depth of coverage information is that it provides the information in a graphical format which makes a clear visual impression and renders the data easily quantifiable, with reference to the y-axis scale 456. The coverage data for various regions is also easily compared in this format.

[0089] In other embodiments of the invention, a sequence depth viewer may graphically depict depth of coverage information in other ways. For example, the actual sequences from which the contig was assembled may be depicted. This way of depicting the sequence coverage information may provide useful information for some users who are concerned with the data acquisition process, for example, the process used in the contig's formation.

[0090] As noted above, the graphical viewer page 400 includes several buttons and windows along the top 403 of the page 400 for accessing and displaying additional information. Several of these have already been discussed, including the Start Coord 405 and ContigID 407 windows. FIGS. 6 and 7 illustrate additional features of this embodiment of a graphical viewer in accordance with the present invention.

[0091] In addition to displaying the start coordinate for the contig sequence displayed in the legend viewer 612, the Start Coord window 605 may receive an entry from a user of a different starting coordinate. The entry of a different start coordinate will bring a different portion of a contig's sequence into view in the legend viewer. For example, FIG. 6 shows a graphical viewer page 600 with the same settings as page 400, except that 0 has been entered in the Start Coord window 605. As a result, the contig sequence 602 and associated loci 604 shown in the legend viewer 612 are shifted 4467 base pairs upstream to the beginning of contig SAU1c0039. The 4467 most downstream base pairs in the depiction of the contig 414 in FIG. 4A are no longer visible in the viewer of page 600. The corresponding views are also depicted by the target viewer 632 and the sequence depth viewer 652.

[0092] Also, in addition to identifying the contig depicted in the viewer 402, the ContigID window 407 may receive an entry from a user of a different ContigID. The entry of a different ContigID will cause the default number of base pairs (preferably starting from the coordinate 0) of the contig sequence associated with the new ContigID to be loaded from the database associated with the viewer and displayed. For example, FIG. 7 shows a graphical viewer page 700 with the ContigID SAU1c0016 entered in the ContigID window 707. As a result, the contig sequence 702 and associated loci 704 shown in the legend viewer 712 are those for contig SAU1c0016. The corresponding views are also depicted by the target viewer 732 and the sequence depth viewer 752.

[0093] As also noted above, the graphical viewer page 400 includes a menu bar 404 for accessing pull-down menus listing various command and control functions. The File pull-down menu lists standard commands found in applications software packages, such as save and print. The Edit pull-down menu provides a list of categories for editing the parameters of the graphical viewers, including the default contig sequence length display number and the colors used to represent various features in the viewers.

[0094] Of particular interest is the View pull-down menu which, together with allowing the user to select which features should be included in the various viewer displays, also includes a View Sequences on Coords 804 option. A graphical viewer page 800 is shown in FIG. 8A with the View pull-down menu 802 selected. Selection of the View Sequences on Coords 804 option from the menu 802 accesses a page 810, depicted in FIG. 8B, listing the sequences used to assemble the contig depicted in the graphical viewer 402, together with the coordinates of each sequence which indicate its coverage. Selecting a sequence from the list, such as the second one in the list, 806503054F1 (5201,5690) 812, and clicking the Sequence Database button 814 accesses a database of the raw sequences used to assemble contigs in the database system associated with the graphical viewer and returns a Sequence Retrieval Results page 820, depicted in FIG. 8C. The Sequence Retrieval Results page 820 depicts the actual nucleotide sequence 822 of the sequence 812 selected in FIG. 8B.

[0095] A generalized process by which a graphical viewer system in accordance with a preferred embodiment of the present invention returns graphical representations of gene locus information to a user is depicted in FIG. 9. This process flow shows only some of the main features of a preferred embodiment of the present invention in order to illustrate in process flow form some of the options for graphically displaying sequence data in accordance with an embodiment of the present invention. It is not intended to provide a comprehensive depiction of the present invention.

[0096] The process 900 begins at 901 and at a step 902 data for a selected locus and its associated contig are loaded into the graphical viewer. As noted above, the locus may be selected from a list in an HTML page provided as part of the user interface of a biomolecular sequence relational database in connection with which the graphical viewer is used, in this case a microbial genomic database. At a step 904, a graphical display of the selected locus on its contig is provided. Preferably, the graphical display has a plurality of components for representing different aspects of the biomolecular sequence data associated with the selected locus. In a particularly preferred embodiment depicted in FIGS. 4A and 4B and described above, the graphical representation is a viewer having three components: a legend viewer, a target viewer, and a sequence depth viewer.

[0097] If no further entries or zoom adjustments are made, the process may end at step 940 following the graphical display at step 904. However, a user may want to use the graphical viewer to extract and display additional information relating to the selected locus or other loci, and the viewer provides additional functionalities for this purpose.

[0098] The graphical representation of the data displayed by the graphical viewer may be modified in a variety of ways. Also, additional information may be accessed by selecting various objects (namely, loci) in a viewer. For example, a user may enter a new contigID in a field provided in a graphical viewer page, such as window 407 in FIG. 4A. If so, decision step 906 is answered in the affirmative and the new contig and its loci are graphically depicted in a viewer at a step 914. A user may also enter a new start coordinate, such as in Start Coord window 405 in FIG. 4A. If so, decision step 908 is answered in the affirmative and the graphical display is adjusted to show the contig in the new coordinate range at a step 916. In addition, as described above, a user may choose to focus in on a particular portion of a graphically depicted contig. If so, decision step 910 is answered in the affirmative and the graphical display in the target viewer, in this embodiment, is adjusted to show the contig in the zoomed view at a step 918. If any of these decision steps are answered in the negative, the graphical viewer display remains unchanged.

[0099] After any of these actions, or in the alternative, a user may select an object to obtain further information. In a preferred embodiment, a locus depicted in a target viewer component of the graphical viewer may be selected by clicking on its representation. If so, decision step 920 is answered in the affirmative and the depiction of the locus in the target viewer may be highlighted with a colored box. If a user wishes to obtain detailed information about the selected locus, the user may do so by double clicking on the depiction of that locus. If so, decision step 921 is answered in the affirmative and a Java® page showing detailed information about the selected locus is shown at a step 922.

[0100] Another feature of this aspect of a preferred embodiment of the present invention is a graphical alignment viewer, as described above. A user may elect to display a graphical viewer which shows the alignment of the amino acid sequence of the locus of interest against some homologous sequences. If so, decision step 924 is answered in the affirmative and the alignment is graphically displayed in a graphical viewer at a step 926.

[0101] A user may also be provided with the option of displaying a multiple organism viewer to view graphical representations of homologous and paralogous loci of the locus of interest. For example, if decision step 920 is answered in the affirmative, a multiple organism viewer may be accessed at a step 930 when a decision step 928 is answered in the affirmative. Further details of the operation of a multiple organism viewer in accordance with a preferred embodiment of the present invention are described below with reference to FIGS. 10A-10E and 11.

[0102] Of course the selected object details and multiple organism selection decisions are independent of each other and could just as easily have been presented in other ways in FIG. 9. Further, it should be noted that the system allows the user to exit from the graphical viewer mode at any time. This option is not depicted in FIG. 9.

[0103] A further option available for accessing further information from a graphical viewer in accordance with the present invention is the display of actual nucleotide or amino acid sequences for a selected sequence associated with the locus of interest and its contig. In a preferred embodiment, a user may choose this option by clicking on a button in a graphical viewer page such as depicted in FIG. 4A. If so, decision step 912 is answered in the affirmative and a list of sequences (sequence identifiers) and coordinates for the sequences from which the contig displayed in the viewer was assembled is displayed at a step 932. A user may then select a sequence from the list. If so, decision step 934 is answered in the affirmative and the actual nucleotide sequence (in this case) of the selected sequence is displayed. The process ends at 940.

[0104] As with other data displayed in graphical viewers in accordance with the present invention, the data used in this aspect of the invention is obtained from an associated biomolecular sequence database and system. The organization and operation of such systems may vary. Examples are provided in the Incyte Pharmaceuticals applications previously incorporated by reference herein. Given the description of the functionality and displays herein, one of skill in the art would be able to implement the graphical viewer of the present invention in any such system.

[0105] Multiple Organism Viewer

[0106] As noted above, a graphical viewer in accordance with the present invention may also provide for the graphical representation and comparison of multiple portions of the genome of one or more organisms based on a locus of interest and its corresponding paralogs (related loci from other portions of an organism's genome) and homologs (related loci from another organism's genome). A preferred embodiment of such a multiple organism viewer is described with reference to FIGS. 10A-10D, below.

[0107] FIG. 10A depicts a main graphical viewer page 1000, like that shown in FIGS. 4A and 4B. In FIG. 10A, a box (“rubber band”) 1002 has been placed around a region 1004 of the portion of the contig 1006 displayed by the legend viewer 1008 component of the graphical viewer 1010. This region 1004 of the contig 1006 is displayed by the target viewer 1020 component of the graphical viewer 1010, and its coverage is depicted by the sequence depth viewer 1030 component. In the target viewer 1020, a box 1022 around locus SAU100242 indicates that that locus has been selected. As noted previously, the main viewer page 1000 includes a Multiple Organisms button 1001.

[0108] Clicking on the Multiple Organisms button 1001 when a locus has been selected in the target viewer retrieves from the database associated with the viewer and displays a list of all libraries containing homologs and paralogs of the selected locus. FIG. 10B depicts a window 1040 showing a list of libraries retrieved for the locus SAU100242 selected in the previous page shown in FIG. 10A. To access a list of individual homologs and paralogs, a user may select one or more libraries in the list displayed in this window 1040. Clicking on the Multiple Organisms button 1042 retrieves the individual homologs and/or paralogs and displays them. FIG. 10C depicts an example of a window 1045 showing a list of homologs and paralogs for the locus SAU100242 from the libraries selected in screen 1040 shown in FIG. 10B. A hit description for each locus is also provided.

[0109] A user may then choose to produce a graphical display of the originally selected locus (e.g., SAU100242) and the selected homologous and paralogous loci displayed in the list of FIG. 10C. By clicking on the Multiple Organisms button 1046 in window 1045, the locus of interest and its homologs and paralogs are loaded into a multiple organism viewer in accordance with a preferred embodiment of the present invention, and the locus of interest and the selected homologs and paralogs are displayed. FIG. 10D depicts an example of such a multiple organism viewer page 1050.

[0110] The multiple organism viewer page 1050 provides a single panel multiple organism viewer 1052 graphically depicting the selected locus of interest (SAU100242) on its contig (SAU1c0039) and the selected homologous and paralogous loci on their respective contigs. FIG. 10D shows a viewer 1052 graphically displaying five (5) contigs in a single page: SAU1c0039 1061, PRT1c0129 1062, SAU2c0391 1063, SEP1c0220 1064, and SHA1c0122 1065. Contig 1061 is shown together with its loci, including the selected locus SAU100242 1071, depicted in bold and italicized in order to more clearly identify it. In the embodiment depicted in FIG. 10D, the loci are annotated with a hit description rather than a Locus ID. Each of the other contigs is also depicted with its loci alongside, and with the loci homologous to SAU100242 (respectively, loci 1072, 1073, 1074 and 1075) shown in bold italics.

[0111] The multiple organism viewer 1052 illustrates another example of the power of a graphical viewer in accordance with the present invention to convey biomolecular sequence information in an effective way. As noted above, the selected locus and its homologous and paralogous loci may be shown bold and italicized, or in other type, such as a particular color, in order to distinguish them as the loci of interest. As may also be seen in FIG. 10D, the loci of interest for the graphically displayed contigs are aligned in the viewer 1052 so that a visual comparison of adjacent loci on the various contigs is easily achieved. This visual representation may be further enhanced through use of the complement feature described below with reference to FIG. 10E.
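One simple way to line up the loci of interest across several contigs is to shift each contig horizontally so that its locus of interest begins in the same column. The following sketch illustrates this under the assumption that alignment is by locus start coordinate; the class name and this alignment rule are hypothetical and are offered for illustration only.

```java
/** Hypothetical computation of per-contig shifts that align the loci of interest. */
final class LocusAlignment {
    /**
     * @param lociStarts lociStarts[i] is the start coordinate of the locus of
     *                   interest on contig i
     * @return offsets[i], the number of bases by which contig i is shifted to
     *         the right so that all loci of interest start in the same column
     */
    static int[] offsets(int[] lociStarts) {
        int max = 0;
        for (int s : lociStarts) {
            max = Math.max(max, s);
        }
        int[] offsets = new int[lociStarts.length];
        for (int i = 0; i < lociStarts.length; i++) {
            // The contig whose locus starts furthest along needs no shift;
            // every other contig is moved right by the difference.
            offsets[i] = max - lociStarts[i];
        }
        return offsets;
    }
}
```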

[0112] Further features of such a graphical viewer in accordance with this embodiment of the invention may be accessed by clicking on pull-down menu selections 1053 provided in the multiple organism viewer page 1050. The menu selections include File, View and Help selections that provide features such as described above with reference to FIG. 4A. The Show selection 1054 accesses a list of all of the loci listed in the window illustrated in FIG. 10C and loaded into the multiple organism viewer. By selecting a locus from the Show pull-down menu, a user may determine whether the locus, along with the contig on which it resides, will be displayed or hidden. The Show menu may also provide for the same determination to be made with respect to the contigs.

[0113] The Complement menu selection 1055 allows a user to manipulate the graphical representations of the contigs and loci in order to facilitate the extraction of salient information from the data. In particular, the Complement menu selection 1055 allows the user to perform a reverse complement on any of the contigs displayed in the multiple organism viewer 1052. In this way, the homologous and paralogous loci displayed in the viewer 1052 may be shown in the same reading direction so that a user may more easily locate patterns of related loci adjacent to the loci of interest. FIG. 10E depicts a multiple organism viewer page 1080 in which the loci of interest depicted in page 1050 are shown with the same reading direction by use of the complement feature to show the reverse complement of contigs 1062, 1063 and 1065.
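The effect of a reverse complement on a contig and its loci may be sketched as follows. The sketch assumes zero-based inclusive coordinates and a strand encoded as +1 or -1; these conventions, and the class and method names, are hypothetical.

```java
/** Hypothetical reverse complement of a nucleotide sequence and of locus coordinates. */
final class ReverseComplement {
    /** Returns the reverse complement of a DNA string; unknown bases become 'N'. */
    static String sequence(String dna) {
        StringBuilder sb = new StringBuilder(dna.length());
        for (int i = dna.length() - 1; i >= 0; i--) {
            sb.append(complement(dna.charAt(i)));
        }
        return sb.toString();
    }

    private static char complement(char base) {
        switch (base) {
            case 'A': return 'T';
            case 'T': return 'A';
            case 'G': return 'C';
            case 'C': return 'G';
            default:  return 'N';
        }
    }

    /**
     * Maps a locus onto the reverse-complemented contig.
     * Returns {newStart, newEnd, newStrand}: the coordinates are mirrored
     * about the contig and the reading direction is flipped.
     */
    static int[] locus(int start, int end, int strand, int contigLength) {
        return new int[] { contigLength - 1 - end, contigLength - 1 - start, -strand };
    }
}
```

Applied to every locus on a contig, this flips each arrow to the other side of the contig line, so that a contig whose locus of interest was read in the reverse direction can be shown with that locus in the forward direction.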

[0114] Shortcuts for the complement feature, as well as other features described herein, may be made available to a user according to methods well known to those of skill in the art. For example, the complement of a locus (contig) may be shown by holding down the shift key on a keyboard used to interface with the computer system on which the graphical viewer is operating while clicking on the contig.

[0115] FIG. 11 depicts a flow chart for a generalized process of the operation of a multiple organism viewer in accordance with a preferred embodiment of the present invention. The process 1100 starts at 1101, and at a step 1102 the multiple organism viewer system receives a selection of a locus of interest, for example by clicking on a locus in the target viewer of FIG. 10A. At a step 1104, a list of libraries containing loci homologous or paralogous to the selected locus of interest is displayed in a window. This display may be initiated by a user clicking on a button, such as the Multiple Organisms button in FIG. 10A. Next, the system receives a selection of one or more libraries from the list at a step 1106, and at a step 1108 a list of loci from the selected libraries which are homologous or paralogous to the selected locus of interest is displayed in a window. At a step 1110, the system receives a selection of loci homologous or paralogous to the selected locus of interest from the list to be displayed. Then, at a step 1112, the selected loci and their respective contigs are graphically displayed in a multiple organism graphical viewer. In a preferred embodiment, the viewer shows all of the contigs and loci in a single panel in order to facilitate comparison of the graphically depicted data. The process ends at 1114.

[0116] Implementation

[0117] It is important to note that the present invention may be implemented as a system or a method, and may be embodied on a variety of computer-readable media that include program instructions, etc. for performing various operations described herein. As noted above, the system implementation is preferably in association with a biomolecular sequence relational database system. The method is a computer-implemented method, generally involving the operation of such a system. The media may be any computer-readable media. Examples of computer-readable media include, but are not limited to, magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROM disks; magneto-optical media such as floptical disks; and hardware devices that are specially configured to store and perform program instructions, such as read-only memory devices (ROM) and random access memory (RAM). The invention may also be embodied in a carrier wave travelling over an appropriate medium such as airwaves, optical lines, electric lines, etc.

CONCLUSION

[0118] Although the foregoing invention has been described in some detail for purposes of clarity of understanding, it will be apparent that certain changes and modifications may be practiced within the scope of the appended claims. It should be noted that there are many alternative ways of implementing methods, media and systems of the present invention. As noted previously, the scope of the invention is not limited to use with a microbial genomic database system such as that in connection with which the invention is primarily described above. Given the description provided herein, one of skill in the art would understand how to use the present invention in connection with a variety of computer-based biomolecular sequence database systems. For example, a graphical viewer in accordance with the present invention may be used in connection with database systems employed to store and analyze other types and forms of nucleic acid sequences or expressed nucleic acid or amino acid sequences. Accordingly, the present embodiments are to be considered as illustrative and not restrictive, and the invention is not to be limited to the details given herein, but may be modified within the scope and equivalents of the appended claims.

What is claimed is:
 1. A method implemented in a computer system for presenting biomolecular sequence data, comprising: retrieving biomolecular sequence data from a database in response to a user query; and graphically depicting elements of the biomolecular sequence data in a user interface for said computer system.
 2. The method of claim 1, wherein said graphical depiction comprises a plurality of panels.
 3. The method of claim 2, wherein said plurality of panels are comprised within a single frame.
 4. The method of claim 3, wherein said plurality of panels provide graphical depictions representing different aspects of said biomolecular sequence data.
 5. The method of claim 4, wherein said biomolecular sequence data comprises gene locus data.
 6. The method of claim 5, wherein said plurality of panels comprises three panels.
 7. The method of claim 6, wherein said three panels comprise a first panel graphically depicting at least a portion of a contig and its associated loci, a second panel graphically depicting at least a portion of the contig depicted in said first panel and annotated loci associated with the portion, and a third panel graphically depicting information indicating the number of sequencing operations conducted to determine the sequence data depicted in the second panel.
 8. The method of claim 7, wherein said third panel graphically depicts sequences used to assemble the portion of the contig depicted in the second panel.
 9. The method of claim 7, wherein said third panel graphically depicts depth of coverage information for the portion of the contig depicted in the second panel.
 10. The method of claim 1, wherein said method is implemented in Java programming language.
 11. A method implemented in a computer system for presenting biomolecular sequence data, comprising: retrieving biomolecular sequence data for a plurality of homologous loci from a database in response to a user query; and graphically depicting at least some of the homologous loci in a user interface for said computer system.
 12. The method of claim 11, wherein said graphical depiction comprises a single panel.
 13. A computer system, comprising: a database including biomolecular sequence data; a user interface capable of receiving a query relating to the biomolecular sequence data, and graphically displaying the results of said query.
 14. The system of claim 13, wherein said graphical depiction comprises a plurality of panels.
 15. The system of claim 14, wherein said plurality of panels are comprised within a single frame.
 16. The system of claim 15, wherein said plurality of panels provide graphical depictions representing different aspects of said biomolecular sequence data.
 17. The system of claim 16, wherein said biomolecular sequence data comprises gene locus data.
 18. The system of claim 17, wherein said gene locus data is depicted in three panels comprising a first panel graphically depicting at least a portion of a contig and its associated loci, a second panel graphically depicting at least a portion of the contig depicted in said first panel and annotated loci associated with the portion, and a third panel graphically depicting information indicating the number of sequencing operations conducted to determine the sequence data depicted in the second panel.
 19. The system of claim 18, wherein said third panel graphically depicts depth of coverage information for the portion of the contig depicted in the second panel.
 20. A computer-readable medium containing programmed instructions arranged to graphically display biomolecular sequence data, the computer-readable medium including programmed instructions for: retrieving biomolecular sequence data from a computer system database in response to a user query; and graphically depicting elements of the biomolecular sequence data in a user interface for the computer system.